=user interface
Eye tracking interfaces are potentially very nice to use. What are their advantages and limitations? What holds their implementation back, and how could those issues be resolved?
Looking at things is easy and
fast, more so than pointing at them using a mouse or trackpad, and doesn't
require extra practice.
Also, your eyes move to look at things
slightly before you become consciously aware of looking at them.
When used properly in an interface, this gives the impression of the
computer anticipating your intention, not in a creepy way, but in a
satisfying way, like you're a surgeon and a nurse puts the right tool in
your hand as you need it.
But, of course, there are some problems.
When you look at something, that doesn't mean your eyes are pointed
directly at that thing; it just means that thing is in your
foveal vision.
So, even with perfect tracking, it's only possible to get within
perhaps 1° of
the intended point. Many buttons on interfaces designed for use with a mouse
are smaller than this, and of course, tracking is never perfect; 2° error is
more realistic for a consumer system.
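To make those numbers concrete, here's a rough conversion from angular error to
on-screen distance (the ~60 cm viewing distance and ~96 DPI monitor are my own
illustrative assumptions):

```python
import math

def gaze_error_on_screen(error_deg, viewing_distance_cm=60, pixels_per_cm=38):
    """Convert an angular gaze-tracking error into a distance on the screen."""
    error_cm = viewing_distance_cm * math.tan(math.radians(error_deg))
    return error_cm, error_cm * pixels_per_cm

for err_deg in (1.0, 2.0):
    cm, px = gaze_error_on_screen(err_deg)
    print(f"{err_deg}° error ≈ {cm:.1f} cm ≈ {px:.0f} px on the screen")
# 1.0° error ≈ 1.0 cm ≈ 40 px
# 2.0° error ≈ 2.1 cm ≈ 80 px -- wider than many toolbar buttons and menu items
```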
Of course, it's possible to
design applications in ways where that's good enough. You "just" need to
redesign all your application interfaces. How much did development of
all mobile operating systems and apps cost? It happened, but it wasn't
cheap.
Also, continuing to look at the same thing doesn't mean your
eyes are stationary: unconscious microsaccades happen constantly even within
a fixation, and the eyes typically jump to a new fixation point 3 to 6 times
per second. To get good
tracking performance, cameras must have both high resolution and moderately
high framerates, which has been a relatively expensive combination, but the
costs of cameras have come down.
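As a rough illustration of why framerate matters here (the ~200 ms fixation
length is my own ballpark figure):

```python
# With 3 to 6 fixations per second, a single fixation lasts very roughly 200 ms.
# The tracker wants several frames within each fixation to average out
# microsaccade jitter and sensor noise, and to notice a new fixation quickly.
for fps in (30, 60, 120):
    frame_ms = 1000 / fps
    print(f"{fps:3d} fps: {frame_ms:5.1f} ms/frame, ~{200 / frame_ms:.0f} frames per fixation")
# 30 fps leaves only ~6 frames to average over; 120 fps gives ~24 and reacts sooner.
```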
High resolution with high
framerate also requires good lighting, because each pixel must receive a
certain number of photons each frame. Fortunately, eyes reflect infrared
light, so if you put an IR LED next to your camera and an IR-pass filter
(which blocks visible light) on the camera, the pupil is fairly easy to pick
out. Most eye trackers do that.
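Once you have an IR-lit view of the eye, the basic image processing can be
quite simple. Here's a crude sketch of dark-pupil detection (assuming OpenCV 4
and an LED that's off-axis enough for the pupil to show up dark; the threshold
is a made-up number, and real trackers also use the corneal glint from the LED
plus per-user calibration to turn pupil position into a gaze point):

```python
import cv2

def find_pupil_center(eye_gray):
    """Crude dark-pupil detection on an IR-lit grayscale image of the eye region."""
    blurred = cv2.GaussianBlur(eye_gray, (7, 7), 0)
    # With off-axis IR illumination, the pupil is usually the darkest blob.
    _, mask = cv2.threshold(blurred, 40, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)
    m = cv2.moments(pupil)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])  # (x, y) centroid in pixels
```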
In theory, you could use face detection with a low-resolution camera
to target another camera at an eye. But a camera that mechanically moves to
track users costs money, can make noise, and can be kind of creepy for
users. A more common approach is to attach the eye tracking camera and IR
light to the user's head.
VR headsets already involve the user wearing something, and they need high
resolution and high refresh rates for a good experience, so they seem like a
perfect fit for head-mounted eye tracking and foveated rendering. But there's
an obvious question: if the display is supposed to fill the user's vision,
where do you put the camera? It is possible to split
the light by separating out IR or using half-silvered mirrors, but that
obviously costs money and increases weight. Still, eye trackers for VR
headsets are being actively pursued, not as an interface system, but for
foveated
rendering.
The eye tracking accuracy needed for foveated
rendering is substantially lower than what's needed for a good interface, so
an affordable fixed camera on a computer monitor is good enough for that. As
such, I don't think that foveated rendering is necessarily tied to VR.
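That tolerance is easy to see in code: if the tracker might be off by a couple
of degrees, you just pad the full-detail region by that much (a sketch with
illustrative thresholds, not taken from any particular headset or monitor
system):

```python
def shading_rate(eccentricity_deg, gaze_error_deg=2.0):
    """Pick a coarser shading rate further from the reported gaze point,
    padded by the expected tracking error so mistakes stay hard to notice."""
    e = max(0.0, eccentricity_deg - gaze_error_deg)
    if e < 5:
        return 1    # full resolution around the gaze point
    elif e < 15:
        return 2    # shade in 2x2 pixel blocks
    else:
        return 4    # shade in 4x4 pixel blocks in the periphery
```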
Using eye tracking for camera
control can be problematic. Suppose someone is playing an FPS where aiming is
linked to camera control, and you want to use eye tracking for it. What
happens when the player looks at a target that's off-center? You could
decouple aiming from camera control, with a mouse controlling the camera and
eye tracking controlling aim, but this requires modifying the game software.
Or you could have a button that snaps the aim to the current eye position, but
there are two issues with that which I already brought up, one of which you
might not have noticed. One is the inherent inaccuracy of eye tracking as an
interface. Can you guess what the other one was?
"Your eyes moving to
look at things happens slightly before you become consciously aware of
looking at them." By the time you press a button to move to what you "are"
looking at, you might already be looking at something else. It's certainly
possible to add a slight delay, but the correct delay length can vary
somewhat. This issue obviously isn't limited to FPS games: it's present any
time you want the user to press a button to do an action with the item
they're looking at. It can be mostly solved, but high framerate and some
intelligent processing are necessary.
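One plausible form of that "intelligent processing" (a sketch using my own
guessed latencies, not a description of any shipping tracker): keep a short
history of gaze samples, and when the button press arrives, resolve the target
against where the eyes were a fraction of a second earlier.

```python
import time
from collections import deque

class GazeHistory:
    """Buffer recent gaze samples so a button press can be resolved against
    where the user was looking when they decided to press, not where their
    eyes happen to be by the time the press arrives."""

    def __init__(self, max_age_s=1.0):
        self.samples = deque()          # (timestamp, x, y)
        self.max_age_s = max_age_s

    def add(self, x, y, t=None):
        t = time.monotonic() if t is None else t
        self.samples.append((t, x, y))
        while self.samples and t - self.samples[0][0] > self.max_age_s:
            self.samples.popleft()

    def target_at_press(self, press_time, look_back_s=0.15, window_s=0.05):
        """Average samples in a small window around press_time - look_back_s.
        The 150 ms look-back (roughly a reaction time) is a guess to tune."""
        center = press_time - look_back_s
        points = [(x, y) for (t, x, y) in self.samples if abs(t - center) <= window_s]
        if not points:
            return None
        return (sum(x for x, _ in points) / len(points),
                sum(y for _, y in points) / len(points))

# On each tracker frame: history.add(gaze_x, gaze_y)
# On the button press:   target = history.target_at_press(time.monotonic())
```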
The bad news is that developing a useful eye tracking interface seems to be a fairly large and expensive project, once you consider the software redesign it requires. But that's also good news for companies like Microsoft and Apple that are pursuing eye tracking: the difficulties involved are a "moat" that could let them recoup their investment.